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Abstract 

External biasing forces are often applied to enhance sampling in regions of phase space which would 
otherwise be rarely observed. While the typical goal of these experiments is to calculate the potential of 
mean force (PMF) along the biasing coordinate, here I present a method to construct PMFs in multiple 
dimensions and along arbitary alternative degrees of freedom. A protocol for multidimensional PMF 
reconstruction from nonequilibrium single-molecule pulling experiments is introduced and tested on a 
series of two dimensional potential surfaces with varying levels of correlation. Reconstruction accuracy 
and convergence from several methods - this new protocol, equilibrium umbrella sampling, and free 
diffusion - are compared, and nonequilibrium pulling is found to be the most efficient. To facilitate the 
use of this method, the source code for this analysis is made freely available. 
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1 Introduction 



External biasing forces are often applied to enhance sampling in regions of phase space which would other- 
wise be rarely observed. 1 These perturbing forces are applied in both time-dependent, as in single-molecule 
pulling 2 5 and steered molecular dynamics 6 ' 7 experiments, or time-indepedent forms, as in umbrella sam- 
pling simulations. 8 Usually, the direction of bias is chosen to follow the reaction coordinate for an interesting 
process, such as a large conformational change in a biological macromolecule. With judicious reweighing 
of biased ensemble properties, previous workers have reconstructed the potential of mean force (PMF) along 
this bias coordinate. 9-15 

The ability to reconstruct PMFs in multiple dimensions and along multiple alternative degrees of free- 
dom would significantly expand the data analysis tool kit for biased experiments. Although the appropriate 
choice of reaction coordinate is fundamental to kinetic calculations, transition state theory, and the accurate 
separation of distinct microstates, its selection is limited by prior knowledge of the system. It would be 
useful to compute PMFs along alternate potential reaction coordinates, in one or several dimensions, after 
biased data is collected along an initial, possibly arbitary choice. In addition, by using multidimensional 
PMFs, the mutual entropy 16 can be used quantify and compare correlations between modes. Here, I develop 
methods to perform these calculations. They are applicable to both computer simulations, in which system 
properties are completely known, or laboratory experiments in which several properties can be simultane- 
ously measured. 



2 The Potential of Mean Force 

The PMF, Fq(x), along a single coordinate, x, is defined as 

- m{x)+5F] _ I S[x - g (r)]e-WW dr 
J e -P{H (r')} dr> 

where 6F is an arbitary constant, is l/k^T, and Hq(t) is the unperturbed system energy as a function of 
the phase space position r. 

In two dimensions, Fq(x) becomes Fq(x, y) and 5[x — x(r)] is replaced by S[x — x(r)]d[y — y{r)]. 
Generalizing to multiple dimensions, Fq(x) becomes Fq(x, 2/1,2/2, • Vn) and the delta function is replaced 
by S[x — x(r)] Yli=l ^iVi ~~ y«( r )]> wnere n is the number of dimensions. If the PMF encompasses all system 
dimensions, it can simply be referred to as the potential. 



3 Multidimensional PMFs from Nonequilibrium Experiments 

Multidimensional PMFs can be reconstructed from time-dependent biasing experiments by applying the 
nonequilibrium work relation 17 ' 18 in a manner analogous to Hummer and Szabo. 11 ' 12 In a single-molecule 
pulling experiment with the Hamiltonian H(r,t) = Hq{t) + V[r,i], where V[r,t] is the time-dependent 
external bias, the biased probability density function is 



e -f3{H (r)+V[ r ,t]} 
f e -P{H (r')+Vir',0]} dr' 



{5(r - r(t))e~ pWt ) (2) 



where Wt is the accumulated work, Jjj 9H g^' t - dt'. 11 ' 12 

To calculate the PMF along the pulling coordinate, Hummer and Szabo consider the special case that 
the perturbation depends only on the biasing coordinate and time, V(r,i) = V[x(r),i\. They then multiply 
equation |2] by e^ v ^ x >^S[x — x(r)], integrate over all phase space, and use the definition in equation Q] to 
obtain 



e 



-/»*(«) = {6{x _ X (r(t)))e-^ W *- V ^) (3) 
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If we instead multiply equation[2]by a two-dimensional delta function, we then obtain the two-dimensional 
PMF. Similarly, multiplying by a multidimensional delta function will lead to a multidimensional PMF. (Al- 
though this is possible in principle, high dimensionality will lead to greater difficulty with sampling.) In the 
two-dimensional case, the PMF is 

e -f3F (x, y ) = _ x ( r (fy) S ( y _ y( r {t)))e-^ Wt - v ^ t]) ) (4) 

In any particular time slice, the position will be more highly sampled near the minima of the bias po- 
tential. To improve accuracy over the entire range of x, data from different time slices can be combined by 
adapting the weighted histogram analysis method (WHAM). 9 After unbiasing with respect to the accumu- 
lated work as well as the time-dependent potential, 11 ' 12 the resultant PMF is 

V (6(x-x(r(t)))S(y-y(v(t)))e- 0Wt ) 

F (x, y) = -(3- 1 In <% v J t) (5) 



4 Correlation 



The usefulness of reconstructing y from experiments biased along x is expected to be related to their degree 
of correlation. One standard measure for the correlation of two coordinates with known probability density, 
f(x, y), is the mutual entropy, 16 

Smutuai = J J f(x,y)log ' dx dy (6) 

where f(x) and f(y) are the marginal probabilities along x and y. As a reference, let f*(y) be a 
probability density estimate based on a histogram of sampling along y without any correction for external 
bias along x. Let the estimate of the PMF using f*(y), F(y) = —k Tln(f*(y)), be known as the naive 
estimate. If x and y are completely independent, f(x,y) = f{x)f{y), f(y) = f*(y), and the mutual 
entropy is zero. In this case, a naive estimate of the PMF along y should be as good as a reconstruction from 
equation [5] If x and y are highly correlated, then the mutual entropy should be high. The naive estimate will 
fail, making the utility of equation [5] more evident. 

Equation [5] was tested on series of two-dimensional potential surfaces consisting of a linear combination 

of 

U a (x, y) = 5x 2 (x - 45) 2 + 1.7 x 10 - V(l/ - 30) 2 (7) 

and 

U b (x, y) = 15 ln[(2x - 3y) 2 + 100] + 1.25 x 10~ 7 (2x + 3y) 2 (2x + 3y - 180) 2 (8) 
according to 

U{x, y, a) = aU a + (1 - a)U b (9) 

where U is in units of pN- nm and positions x and y are in units of nm. U a and U b both have minima 
at (0,0) and (45,30), and U a has additional minima at (0,30) and (45,0). Part c of figures Q] El andg]show 
the potential U(x, y, a) at a of 0, 0.30, 0.75, and 1.00, respectively. 

In U a (Fig. @t), x and y are completely independent, while in U b (Fig. [T]:), they are highly correlated. 
This is quantified by the mutual entropy (Fig. [5]>, which was calculated by assuming a Boltzmann distribution 
for x and y, f(x,y) = e~@ u<yX,y \ and numerically integrating equation [6] between —20 < x < 65 and 
—20 < y < 50 via adaptive Simpson quadrature. 

As expected, mutual entropy decreases as the potential becomes more like U a . The joint entropy, S xy = 
J f(x, y) dx dy, and the marginal entropies, S x = J f(x) dx and S y = J f(y) dy, are mostly stable as a 
function of a. These magnitude of these factors cannot be directly compared to the mutual entropy because 
they are offset by an arbitrary constant, whereas mutual entropy is invariant to scale. 
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Figure 1 : PMF reconstruction on the surface with a = from umbrella sampling, a. PMFs along the y 
coordinate. The blue line is the theoretical surface, the red asterisks are the biased reconstruction, and green 
dots are the naive reconstruction, b. Contour plot of the reconstructed potential, c. Contour plot of the 
theoretical potential. The scale for both b and c and indicated by the side color bar. d. PMFs along the x 
coordinate. Annotation is the same as in a. 
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Figure 2: PMF reconstruction on the surface with a = 0.3 from time-dependent biasing experiments. 
Annotation is the same as in Figure [TJ 
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Figure 3: PMF reconstruction on the surface with a = 0.75 from diffusion experiments. Annotation is the 
same as in Figure [Q except there are no WHAM reconstructions. 




Figure 4: PMF reconstruction on the surface with a = 1.0 from time-dependent biasing experiments. 
Annotation is the same as in Figure [TJ 
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Coupling parameter a 

Figure 5: Entropy as a function of the coupling parameter a. The mutual entropy, S mutU ai, is designated by 
connected squares. The joint entropy, S xy , and the marginal entropies, S x and S y , are labeled and marked 
by connected circles. 
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Figure 6: Root mean square error for y PMF estimates as a function of mutual entropy. Reconstruction 
errors for pulling experiments are indicated by squares and for umbrella sampling by circles. Errors from 
unbiased diffusion on the surface are indicated by diamonds. Errors in WHAM-reconstructed surfaces are 
connected by solid lines, while the naive estimate errors are not connected. 

5 Simulations on a Two-Dimensional Energy Surface 

Brownian dynamics simulations 19 were run on the series of potential surfaces. The progress of a dynamical 
variable is given by Ax(t) = F(x(t ~ 1 j )gAt + (2DAt) 1 / 2 R, where F(x) = ~ dU d ^ ,y) is the force, T is 
the temperature (300 K), D is the diffusion constant (1200 nm 2 /s), At is the time step (1 ms), and R is a 
normally distributed random variable. 

After 1000 steps of equilibration, umbrella sampling and pulling simulations were run for 100 itera- 
tions of 10000 time steps. In addition, 10 unbiased diffusion simulations were run for 100000 time steps. 
Umbrella sampling was performed with a harmonic bias, V(x(r)) = l/2k s (b — x(r)) 2 , setting the spring 
constant, k s , at O.lpN/nm and the bias center, b, at one of 100 evenly spaced positions between -5 and 50 
nm. Pulling simulations were performed with the time-dependent perturbation, V[x(r), t] = \/2k s (b{t) — 
x{r)) 2 , with k s set to 0ApN/nm. A periodic biasing program, 20 b(t) = 22.5 - 27.5 cos(27rt/10000), 
which has one period per trajectory, was used. Accumulated work was numerically integrated by W t = 
X)f=i ~~ ks[(xj + Xj-\)/2 — (bj + bj^\)/2](bj — foj-i), where bj is the position of the bias center at time 



Theoretical surfaces, U(x) and U(y), were calculated by assuming a Boltzmann distribution, f(x, y) = 
e -f3U(x,y)^ an( j num erically integrating f(x, y) over the other coordinate (with limits of —20 < x < 65 and 
—20 < y < 50) using adaptive Simpson quadrature to obtain a marginal probability density. PMFs are then 
obtained by taking natural logorithms and multiplying by —k^T. In time-dependent biasing experiments, 
reconstructed surfaces for x were calculated using equation [3l two-dimensional surfaces from equation \5\ 
and surfaces along y by integrating over x; from the estimate of f(x, y), the sum is taken over x to yield an 
unnormalized f(y), from which U(y) = — kbT In f(y) is calculated. PMFs from umbrella sampling were 
calculated as described by Roux. 10 Minima of all PMFs were set to zero. Errors were calculated by the 
root mean square deviation from the theoretical surface at 50 points between —10.9091 < x < 55.9091 and 
7.2727 < y < 37.2727, ranges which include the zeroes of U a and U^. The root mean square error (RMSE) 



6 Results and Discussion 

When x and y are completely independent, the naive estimate for F(y) is as good as the estimate from 
equation [5] (Fig. Hk). If x and y are highly correlated, however, the naive estimate fails and a WHAM- 
informed reconstruction method is necessary (Fig. [TJ. There is no simple dependence, however, between 
error and mutual entropy (Fig. ©, as other factors of the free energy surface come into play. 



^3=1 

step j. 



along y is defined as J i Y,i=i[ F theoreticai{yi) ~ F reconstructed (yi)] 2 
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Figure 7: Root mean square error for y PMF estimates as a function of number of simulations, n. Error 
bars indicate the mean and standard deviations of RMSE from five reconstructions using n simulations. 
Reconstruction errors for pulling experiments are indicated by squares and for umbrella sampling by circles. 
Unbiased simulations were run for 10 times as long and are therefore spaced 10 times farther apart on the x 
axis. 

For example, due to the highly diffusive nature of the dynamics and the large timestep necessary for fast 
sampling, sampling is erroneously improved near the boundaries of small low-energy wells. This leads to 
systematic free energy underestimates in all simulations, even those without any external bias (i.e. Fig. [3]). 
Although there are errors in WHAM-informed reconstructions from simulations with external bias, they are 
no worse than errors from long simulations with free diffusion (Fig. [6]>. 

On certain free energy surfaces, phase space sampling along the y dimension may be limited by trapping 
in local minima. This is especially true if x and y are relatively independent and biasing along x does not 
enhance conformational sampling along y. Sampling is generally limited in high-energy regions of phase 
space (see Figs. [H [2j El and @]). In these regions, the most accurate two-dimensional PMF reconstruction 
would require biasing along both x and y, which can be accounted for by a simple modification of equation 
[5] In general, accurate PMF reconstruction in higher dimensions may require biasing along coordinates that 
closely correspond to these dimensions. Close correspondence may be indicated by high mutual entropy; 
this aspect of analysis will be left to further study. 

On the toy two-dimensional surfaces studied, PMF reconstructions along y using equation [5] actually 
converge more quickly than equilibrium umbrella sampling and reconstruction from unbiased experiments 
(Fig. U}- While time-dependent biasing experiments will sample the whole span of the bias coordinate de 
facto, umbrella sampling will inherently require a certain number of simulations at different bias centers 
to attain this broad range. Similarly, unbiased simulations will need to be run for longer in order to reach 
more distal regions of phase space. Generally, it is preferable to run a greater number of nonequilibrium 
experiments than a single long equilibrium experiment because the outcome of each individual trajectory 
can be highly dependent on initial conditions. These sampling problems described on two-dimensional 
surfaces are likely to be enhanced in systems with higher complexity. The main caveat with this method, as 
with all methods based on the nonequilibrium work relation, is that with greater free energy differences, the 
likelihood of observing a trajectories with negative dissipative work (work that is less than the free energy 
difference) is increasingly unlikely. 

Pulling experiments are poised to find broader utility in studies of nanoscale systems, particularly when 
free energy differences between states are not prohibitively large. I hope this work will find wide application 
in computational and laboratory experiments and stimulate further research in the area. MATLAB source 
code to perform the described analysis is available at http://mccammon.ucsd.edu/~dminh/software/ 
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